Skip to content

Always use an implicit done callback#7

Open
TomiBelan wants to merge 2 commits intocreationix:masterfrom
TomiBelan:nextsafe
Open

Always use an implicit done callback#7
TomiBelan wants to merge 2 commits intocreationix:masterfrom
TomiBelan:nextsafe

Conversation

@TomiBelan
Copy link
Copy Markdown
Contributor

This pull request is branched off from #6. You probably want to look at 52b86fd, not the combined diff of both this and #6. If you want to merge it without #6, tell me and I'll rebase it.

What the code does:
Remove nextPlain() and always use nextSafe(). If the user hasn't provided a done callback, use an implicit one that throws exceptions globally in a new call stack.


Tl;dr why:

var run = require('./.');

function fragileAsyncOperation(number, cb) {
  console.log("doing setup");
  process.nextTick(function () {
    cb(null, number * number);
    console.log("doing cleanup");   // I sure hope this gets executed!
  });
}

function *myProgram(gen) {
  console.log("program start");
  yield process.nextTick(gen());
  yield fragileAsyncOperation(8, gen());
  null();   // a wild type error appeared!
}

run(myProgram, function(e){ console.log(e); });     // runs cleanup :)
setTimeout(function() { run(myProgram); }, 1000);   // doesn't run cleanup :(
// (the setTimeout is there to distinguish which run is which)

Looong and detailed why:

Async functions have invariants -- rules that everyone must follow. The main rule is something like this: "If I call an async function and give it a callback, that async function will never throw anything, and it will make sure the callback gets called exactly once." I hope you'll agree that this rule is correct and useful. A lot of code implicitly assumes the rule holds. If an async function throws, or calls the callback twice, or not at all, everything breaks. (Of course, not everyone follows the rule -- that's why gen-run has protection against double calls etc. But ideally everyone would.)

To make the main rule work, you also need other rules, such as: "The callback must never throw anything either." Because what if the async function is actually synchronous:

function myAsyncFunction(..., cb) {
  cb(null, ...);
}

Or what if it uses some error handling like this one:

function myAsyncFunction(..., cb) {
  doSomethingAsynchronous(..., function (...) {
    try {
      var result = ...;   // might throw
      cb(null, result);
    } catch (e) {
      cb(e);   // oh dear -- if cb throws, we might call it twice!
    }
  });
}

Or what if the async function wants to do some cleanup after calling cb, like the example above. (Though that might be less common.)

When you call an async function and give it a callback you're actually promising to the async function that the callback is well-behaved and won't do something unexpected like throwing exceptions.

In our case, the async function is n.value and the callback we're giving it is gen(). So the rule says gen() should never throw, which means check should never throw, which means next should never throw. Which means if you ever use nextPlain, you break the main async rule. So this PR removes it and just uses nextSafe all the time.

But then, what to do when the generator actually throws an exception? We need to catch it because throwing something from gen() is not allowed by the rule, but we don't want to silently swallow it. Lacking a better option, we throw the exception globally, in a new call stack created by setTimeout(..., 0). We could've also used process.nextTick, but setTimeout also works in browsers and the distinction isn't very important in this case. (Btw, streamline.js does this too.)

This whole matter is quite complex, so if you're not convinced, let's discuss further.

@creationix
Copy link
Copy Markdown
Owner

I'm not sure how throwing exceptions that aren't catch-able is somehow better than throwing them where it's most convenient. What does that gain?

@creationix
Copy link
Copy Markdown
Owner

With regards to the rule "If I call an async function and give it a callback, that async function will never throw anything, and it will make sure the callback gets called exactly once.", I throw in async functions all the time as long as that throw is in the initial tick before the function returns. Usually this is in the form of TypeErrors because some input was invalid. I do agree that async functions should probably not throw after they return because then it's uncatchable. (though node domains change this as well)

@creationix
Copy link
Copy Markdown
Owner

To make the main rule work, you also need other rules, such as: "The callback must never throw anything either." Because what if the async function is actually synchronous:

Like I said before. I'm fine with any function throwing before it returns which would be the case of a synchronous callback throwing. You would be doing the user a disservice by catching this error and throwing it in a later tick where they can't catch it.

@TomiBelan
Copy link
Copy Markdown
Contributor Author

Thanks for your response. I see your point, and I must admit I forgot about the TypeError case. You're right, the async function can throw an exception if it's in the same tick, so the rule I wrote above is incomplete. But I still think next should never throw.

In the current HEAD, exceptions thrown from the generator bubble up the callstack and can be caught. You say this is a useful feature. I say it's a bad idea. A user can almost never rely on being consistently able to catch those exceptions.

function syncFn(cb) {
  cb(null, "a result");   // (A)
}
function asyncFn(cb) {
  process.nextTick(function () {
    cb(null, "a result");   // (B)
  });
}
function *myProgram() {
  // ???
  throw Error("boo!");
}
run(myProgram);   // (C)

The shape of the callstack depends on what's in the "???" section. If it's all synchronous (e.g. just yield syncFn; or nothing), the user can catch the exception by wrapping (C) in a try/catch. If it's asynchronous, the exception will be thrown out of cb at line (B) and can be caught there. If it's only sometimes asynchronous, e.g. if (complicatedExpression) yield asyncFn;, you can't know in advance whether the exception will 'emerge' at (B) or (C).

An exception thrown from a generator can only be caught in the last async callback that ran, or at the callsite of run() if everything was synchronous. This might not always be the same place. If you put try/catch in some of these places but not in others, the exception will be caught at some times but not at others. (And many of these async functions will be provided by libraries, worsening the problem.) That is a bad thing, and in my opinion, worse than just making it uncatchable by throwing it in a later tick. That's also a bad thing, but at least it's consistent. An exception that is caught at some times but not at others would be really hard to debug.

As a user of the library, I either want to handle all generator exceptions (with a done callback), or I want a guarantee that I'll see all of them in the error console (a rogue async function won't catch them, nor will they break any async functions that naively assume cb won't throw anything). This commit preserves those two options, and removes the current default behavior, "how an exception is handled depends on what was the last async continuable you yielded, and whether it ran cb in a try/catch block". That doesn't look like something someone would want. Making all exceptions behave consistently (either throwing them all uncatchably, or passing them all to the done callback) makes gen-run programs safer and easier to debug.

Apologies for the post length. :-) What do you think?

@creationix
Copy link
Copy Markdown
Owner

So basically you're saying it's better to make all exceptions uncatchable instead of just the ones after the first yield. Not sure I agree with that, especially since it makes the code more complex and slower to run. Both cases are pretty bad and people should be passing in a callback to catch all exceptions.

@TomiBelan
Copy link
Copy Markdown
Contributor Author

Yes, that's the gist of it. Especially because it's not "first yield", it's the first asynchronous yield, so it'd be even more confusing. (For instance, in the above example, putting a try/catch around line (A) doesn't do anything.) In an ideal world people would call run() with a done callback, but when they don't, I think the best option we have is throwing the errors in a setTimeout. Streamline.js seems to think so too.

I don't think the new code is more complex or slower to run. A single "next" function seems nicer than having two. And even though you now run everything in nextSafe's try/catch, even stuff that ran in nextPlain before, the overhead is negligible in JavaScript. See the difference between test1 and test3 in this benchmark: http://jsperf.com/try-catch-performance-overhead

Hmm... one more thing. Perhaps this shorter comment in done would be better?

// If the generator threw an exception, throw it globally with setTimeout.
// Use your own done callback to ensure you'll catch all exceptions.

@creationix
Copy link
Copy Markdown
Owner

Ok, so I can see how this does reduce the total number of lines of code. But what are we really doing here. The only changed behavior is already in a bad case. The user opted to not pass in a callback and exceptions were thrown. There is no reliable way to recover from this except for something external that has access to all the event sources like node domains.

With the original version, these exceptions may throw in the original tick if they happen soon enough and if not, they will be uncatchable. In the original version this code runs without any try..catch of any kind and gen-run generally stays out of the way and lets you shoot yourself in the foot.

With the modified version, we are putting a try..catch around all code that runs here. This has a very substantial performance cost in more complex applications because many V8 optimizations don't work in the presence of try..catch. Also we're using timers, something that has nothing to do with the problem at hand as a hacky way to make the errors as painful as possible all the time in hope the user discovers their errors sooner.

Generally, my policy whenever there is a question as which is the right thing, I do the least amount of interference possible. I don't want to be doing things on behalf of the user unless I'm 100% sure it's the right thing to do.

All users of gen-run who care about catching all their errors will pass in a callback. If we were so concerned about hand-holding we would simply make the callback required and throw a TypeError if it's omitted. The no-callback case is for code that doesn't care about what happens in case of an error and is fine with the process crashing. The initial tick is probably not surrounded by a try..catch either.

Another use case is the user is using domains or something like that and uncaught exceptions aren't dangerous. In that case they do know what they are doing and us wrapping in try..catch and setTimeout only gets in the way and slows things down that would otherwise work just fine.

@TomiBelan
Copy link
Copy Markdown
Contributor Author

Users who don't pass a callback to run() are generally ok with the process crashing, and they probably don't put run() inside a try/catch either -- I agree. But even if they want the process to crash, the async function can stupidly prevent it. For example, I've seen code like this in real libraries: try { ...; cb(null, result); } catch (e) { cb(e); } If cb throws, the async function just calls it again, which fails the "done" check and the exception gets swallowed silently in resume(). This might not be a common occurence, but I'd feel safer if the yielded async functions didn't have any way to influence this.

Could you point me to more information about the performance cost of try/catch in V8? All resources I found imply that even if the function using try/catch is not optimized, other functions it calls are ok. So next would be slower, but it's small anyway, and the generator would be fast as usual. Look at test2 vs test3 in the benchmark I posted. But if running the generator inside try/catch really is slower, I agree that would be a blocker for this PR.

@creationix
Copy link
Copy Markdown
Owner

I think doing less in this library is better. If you want consistenty async behavior as opposed to the natural behavior of generators to run sync till their first non-blocking yield, you can externally cal run in a new event.

To make this easier, I think I'd be find with converting gen-run to a continuable style API. I fear people will abuse this and nest them improperly, but whatever, that's not my problem.

This would mean the callback is almost required. The three ways to run it would be:

// Normal callback-last mode
run(generator, callback);

// Continuable mode
run(generator)(callback);

// Bare mode
run(generator()();

Then if you truly wanted to crash on all errors you could do something like:

setTimeout(run(generator), 0);

https://github.com/creationix/gen-run/compare/catchy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants